Weblate is growing quite well in last months, but sometimes it's development is really driven by people who complain instead of following some roadmap with higher goals. I think it's time to change it at least a little bit. In order to get broader feedback I've sent out short survey to active project owners in Hosted Weblate week ago. I've decided to target at smaller audience for now, though publicly open survey might follow later (but it's always harder to evaluate feedback across different user groups). Overall feelings were really positive, most people find Weblate better than other similar services they have used. This is really something I like to hear :-). Weblate overall experience

But the most important part for me was where users want to see improvements. This somehow matches my expectation that we really should improve the user interface. Weblate future development

We have quite a lot features, which are really hidden in the user interface. Also interface for some of the features is far from being intuitive. This all probably comes from the fact that we really don't have anybody experienced with creating user interfaces right now. It's time to find somebody who will help us. In case you are able to help or know somebody who might be interested in helping, please get in touch. Weblate is free software, but this can still be paid job. Last part of the survey was focused on some particular features, but the outcome was not as clear as I hoped for as almost all feature group attracted about same attention (with one exception being extending the API, which was not really wanted by most of the users). Overall I think doing some survey like this is useful and I will certainly repeat it (probably yearly or so), to see where we're moving and what our users want. Having feedback from users is important for every project and this seemed to worked quite well. Anyway if you have further feedback, don't hesitate to use our issue tracker at GitHub or contact me directly.

Filed under: Debian English phpMyAdmin SUSE Weblate 0 comments

Weblate 2.12 has been released today, few days behind schedule. It brings improved screenshots management, better search and replace features or improved import. Many of the new features were already announced in previous post, where you can find more details about them. Full list of changes:

Improved admin interface for groups.
Added support for Yandex Translate API.
Improved speed of sitewide search.
Added project and component wide search.
Added project and component wide search and replace.
Improved rendering of inconsistent translations.
Added support for opening source files in local editor.
Added support for configuring visual keyboard with special characters.
Improved screenshot management with OCR support for matching source strings.
Default commit message now includes translation information and URL.
Added support for Joomla translation format.
Improved reliability of import across file formats.

If you are upgrading from older version, please follow our upgrading instructions. You can find more information about Weblate on https://weblate.org, the code is hosted on Github. If you are curious how it looks, you can try it out on demo server. You can login there with demo account using demo password or register your own user. Weblate is also being used on https://hosted.weblate.org/ as official translating service for phpMyAdmin, OsmAnd, Aptoide, FreedomBox, Weblate itself and many other projects. Should you be looking for hosting of translations for your project, I'm happy to host them for you or help with setting it up on your infrastructure. Further development of Weblate would not be possible without people providing donations, thanks to everybody who have helped so far! The roadmap for next release is just being prepared, you can influence this by expressing support for individual issues either by comments or by providing bounty for them.

Filed under: Debian English phpMyAdmin SUSE Weblate 0 comments

Weblate should be released by end of February, so it's now pretty much clear what will be there. So let's look at some of the upcoming features. There were many improvements in search related features. They got performance improvements (this is especially noticeable on site wide search). Additionally you can search for strings within translation project. On related topic, search and replace is now available for component or project wide operations, what can help you in case of massive renaming in your translations. We have worked on improving machine translations as well, this time we've added support for Yandex. In case you know some machine translation service which we do not yet support, please submit that to our issue tracker. Biggest improvement so far comes for visual context feature - it allows you to upload screenshots which are later shown to translators to give them better idea where and in which context the translation is used. So far you had to manually upload screenshot for every source string, what was far from being easy to use. With Weblate 2.12 (and this is already available on Hosted Weblate right now) the screenshots management got way better. There is now separate interface to manage screenshots (see screenshots for Weblate as an example), you can assign every screenshot to multiple source strings, however you can also let Weblate automatically recognize texts on the screenshots using OCR and suggest strings to assign. This can save you quite a lot of effort, especially with screenshots with lot of strings. This feature is still in early phase, so the suggestions are not always 100% matching, but we're working to improve it further. There will be some more features as well, you can look at our 2.12 milestone at GitHub to follow the process.

Filed under: Debian English SUSE Weblate 0 comments

Scratch is a block-based programming language created by the Lifelong Kindergarten Group (LLK) at the MIT Media Lab. Scratch gives kids the power to use programming to create their own interactive animations and computer games. Since 2007, the online community that allows Scratch programmers to share, remix, and socialize around their projects has drawn more than 16 million users who have shared nearly 20 million projects and more than 100 million comments. It is one of the most popular ways for kids to learn programming and among the larger online communities for kids in general.

Since 2010, I have published a series of papers using quantitative data collected from the database behind the Scratch online community. As the source of data for many of my first quantitative and data scientific papers, it s not a major exaggeration to say that I have built my academic career on the dataset. I was able to do this work because I happened to be doing my masters in a research group that shared a physical space ( The Cube ) with LLK and because I was friends with Andr s Monroy-Hern ndez, who started in my masters cohort at the Media Lab. A year or so after we met, Andr s conceived of the Scratch online community and created the first version for his masters thesis project. Because I was at MIT and because I knew the right people, I was able to get added to the IRB protocols and jump through the hoops necessary to get access to the database. Over the years, Andr s and I have heard over and over, in conversation and in reviews of our papers, that we were privileged to have access to such a rich dataset. More than three years ago, Andr s and I began trying to figure out how we might broaden this access. Andr s had the idea of taking advantage of the launch of Scratch 2.0 in 2013 to focus on trying to release the first five years of Scratch 1.x online community data (March 2007 through March 2012) most of the period that the codebase he had written ran the site. After more work than I have put into any single research paper or project, Andr s and I have published a data descriptor in Nature s new journal Scientific Data. This means that the data is now accessible to other researchers. The data includes five years of detailed longitudinal data organized in 32 tables with information drawn from more than 1 million Scratch users, nearly 2 million Scratch projects, more than 10 million comments, more than 30 million visits to Scratch projects, and much more. The dataset includes metadata on user behavior as well the full source code for every project. Alongside the data is the source code for all of the software that ran the website and that users used to create the projects as well as the code used to produce the dataset we ve released. Releasing the dataset was a complicated process. First, we had navigate important ethical concerns about the the impact that a release of any data might have on Scratch s users. Toward that end, we worked closely with the Scratch team and the the ethics board at MIT to design a protocol for the release that balanced these risks with the benefit of a release. The most important features of our approach in this regard is that the dataset we re releasing is limited to only public data. Although the data is public, we understand that computational access to data is different in important ways to access via a browser or API. As a result, we re requiring anybody interested in the data to tell us who they are and agree to a detailed usage agreement. The Scratch team will vet these applicants. Although we re worried that this creates a barrier to access, we think this approach strikes a reasonable balance. Beyond the the social and ethical issues, creating the dataset was an enormous task. Andr s and I spent Sunday afternoons over much of the last three years going column-by-column through the MySQL database that ran Scratch. We looked through the source code and the version control system to figure out how the data was created. We spent an enormous amount of time trying to figure out which columns and rows were public. Most of our work went into creating detailed codebooks and documentation that we hope makes the process of using this data much easier for others (the data descriptor is just a brief overview of what s available). Serializing some of the larger tables took days of computer time. In this process, we had a huge amount of help from many others including an enormous amount of time and support from Mitch Resnick, Natalie Rusk, Sayamindu Dasgupta, and Benjamin Berg at MIT as well as from many other on the Scratch Team. We also had an enormous amount of feedback from a group of a couple dozen researchers who tested the release as well as others who helped us work through through the technical, social, and ethical challenges. The National Science Foundation funded both my work on the project and the creation of Scratch itself. Because access to data has been limited, there has been less research on Scratch than the importance of the system warrants. We hope our work will change this. We can imagine studies using the dataset by scholars in communication, computer science, education, sociology, network science, and beyond. We re hoping that by opening up this dataset to others, scholars with different interests, different questions, and in different fields can benefit in the way that Andr s and I have. I suspect that there are other careers waiting to be made with this dataset and I m excited by the prospect of watching those careers develop. You can find out more about the dataset, and how to apply for access, by reading the data descriptor on Nature s website.

Exactly on the schedule, Weblate 2.11 is out today. This release brings extended stats available to users and various other improvements and bug fixes. Full list of changes:

Include language detailed information on language page.
Mercurial backend improvements.
Added option to specify translation component priority.
More consistent usage of Group ACL even with less used permissions.
Added WL_BRANCH variable to hook scripts.
Improved developer documentation.
Better compatibility with various Git versions in Git exporter addon.
Included per project and component stats.
Added language code mapping for better support of Microsoft Translate API.
Moved fulltext cleanup to background job to make translation removal faster.
Fixed displaying of plural source for languages with single plural form.
Improved error handling in import_project.
Various performance improvements.

Filed under: Debian English phpMyAdmin SUSE Weblate 0 comments

As children use digital media to learn and socialize, others are collecting and analyzing data about these activities. In school and at play, these children find that they are the subjects of data science. As believers in the power of data analysis, we believe that this approach falls short of data science s potential to promote innovation, learning, and power. Motivated by this fact, we have been working over the last three years as part of a team at the MIT Media Lab and the University of Washington to design and build a system that attempts to support an alternative vision: children as data scientists. The system we have built is described in a new paper Scratch Community Blocks: Supporting Children as Data Scientists that will be published in the proceedings of CHI 2017. Our system is built on top of Scratch, a visual, block-based programming language designed for children and youth. Scratch is also an online community with over 15 million registered members who share their Scratch projects, remix each others work, have conversations, provide feedback, bookmark or love projects they like, follow other users, and more. Over the last decade, researchers including us have used the Scratch online community s database to study the youth using Scratch. With Scratch Community Blocks, we attempt to put the power to programmatically analyze these data into the hands of the users themselves. To do so, our new system adds a set of new programming primitives (blocks) to Scratch so that users can access public data from the Scratch website from inside Scratch. Blocks in the new system gives users access to project and user metadata, information about social interaction, and data about what types of code are used in projects. The full palette of blocks to access different categories of data is shown below.

The new blocks allow users to programmatically access, filter, and analyze data about their own participation in the community. For example, with the simple script below, we can find whether we have followers in Scratch who report themselves to be from Spain, and what their usernames are.

In designing the system, we had two primary motivations. First, we wanted to support avenues through which children can engage in curiosity-driven, creative explorations of public Scratch data. Second, we wanted to foster self-reflection with data. As children looked back upon their own participation and coding activity in Scratch through the project they and their peers made, we wanted them to reflect on their own behavior and learning in ways that shaped their future behavior and promoted exploration. After designing and building the system over 2014 and 2015, we invited a group of active Scratch users to beta test the system in early 2016. Over four months, 700 users created more than 1,600 projects. The diversity and depth of users creativity with the new blocks surprised us. Children created projects that gave the viewer of the project a personalized doughnut-chart visualization of their coding vocabulary on Scratch, rendered the viewer s number of followers as scoops of ice-cream on a cone, attempted to find whether love-its for projects are more common on Scratch than favorites , and told users how talkative they were by counting the cumulative string-length of project titles and descriptions. We found that children, rather than making canonical visualizations such as pie-charts or bar-graphs, frequently made information representations that spoke to their own identities and aesthetic sensibilities. A 13-year-old girl had made a virtual doll dress-up game where the player s ability to buy virtual clothes and accessories for the doll was determined by the level of their activity in the Scratch community. When we asked about her motivation for making such a project, she said:

I was trying to think of something that somebody hadn t done yet, and I didn t see that. And also I really like to do art on Scratch and that was a good opportunity to use that and mix the two [art and data] together.

We also found at least some evidence that the system supported self-reflection with data. For example, after seeing a project that showed its viewers a visualization of their past coding vocabulary, a 15-year-old realized that he does not do much programming with the pen-related primitives in Scratch, and wrote in a comment, epic! looks like we need to use more pen blocks. :D.

Additionally, we noted that that as children made and interacted with projects made with Scratch Community Blocks, they started to critically think about the implications of data collection and analysis. These conversations are the subject of another paper (also being published in CHI 2017). In a 1971 article called Teaching Children to be Mathematicians vs. Teaching About Mathematics , Seymour Papert argued for the need for children doing mathematics vs. learning about it. He showed how Logo, the programming language he was developing at that time with his colleagues, could offer children a space to use and engage with mathematical ideas in creative and personally motivated ways. This, he argued, enabled children to go beyond knowing about mathematics to doing mathematics, as a mathematician would. Scratch Community Blocks has not yet been launched for all Scratch users and has several important limitations we discuss in the paper. That said, we feel that the projects created by children in our the beta test demonstrate the real potential for children to do data science, and not just know about it, provide data for it, and to have their behavior nudged and shaped by it.

This blog post and the paper it describes are collaborative work with Sayamindu Dasgupta. We have also received support and feedback from members of the Scratch team at MIT (especially Mitch Resnick and Natalie Rusk), as well as from Hal Abelson. Financial support came from the US National Science Foundation. The paper itself is open access so anyone can read the entire paper here. This blog post was also posted on Sayamindu Dasgupta s blog, on the Community Data Science Collective blog, and in several other places.

Today, Weblate got new home. The difference is not that big - it has been moved from my personal GitHub account to WeblateOrg organization. The main motivation is to have all Weblate related repositories in one location (all others including wlc, Docker or website are already there). The move will also allow to better manage the project in future as having it in separate repositories provides less management options on GitHub than using organization. In case you have cloned the git repository, please update

git remote set-url origin https://github.com/WeblateOrg/weblate.git

Of course all issue tracker locations have changed as well (I believe the redirect on GitHub will stay as long as I won't fork the repository, so expect it to work at least month). See GitHub documentation on repository moving. I'm sorry for all the troubles, but I think this is really necessary move.

Filed under: Debian English SUSE Weblate 0 comments

As web search engines and IRC seems to be of no help, maybe someone here has a helpful idea. I have some service written in python that comes with a .service file for systemd. I now want to build&install a working service file from the software's setup.py. I can override the build/build_py commands of setuptools, however that way I still lack knowledge wrt. the bindir/prefix where my service script will be installed. Solution Turns out, if you override the install command (not the install_data!), you will have self.root and self.install_scripts (and lots of other self.install_*). As a result, you can read the template and write the desired output file after calling super's run method. The fix was inspired by GateOne (which, however doesn't get the --root parameter right, you need to strip self.root from the beginning of the path to actually make that work as intended). As suggested on IRC, the snippet (and my software) no use pkg-config to get at the systemd path as well. This is a nice improvement orthogonal to the original problem. The implementation here follows bley.


def systemd_unit_path():
    try:
        command = ["pkg-config", "--variable=systemdsystemunitdir", "systemd"]
        path = subprocess.check_output(command, stderr=subprocess.STDOUT)
        return path.decode().replace('\n', '')
    except (subprocess.CalledProcessError, OSError):
        return "/lib/systemd/system"
class my_install(install):
    _servicefiles = [
        'foo/bar.service',
        ]
    def run(self):
        install.run(self)
        if not self.dry_run:
            bindir = self.install_scripts
            if bindir.startswith(self.root):
                bindir = bindir[len(self.root):]
            systemddir = "%s%s" % (self.root, systemd_unit_path())
            for servicefile in self._servicefiles:
                service = os.path.split(servicefile)[1]
                self.announce("Creating %s" % os.path.join(systemddir, service),
                              level=2)
                with open(servicefile) as servicefd:
                    servicedata = servicefd.read()
                with open(os.path.join(systemddir, service), "w") as servicefd:
                    servicefd.write(servicedata.replace("%BINDIR%", bindir))

Comments, suggestions and improvements, of course, welcome!

Background: I ve been relearning Rust (more about that in a separate post, some time later), and in doing so, I chose to implement the low-level parts of git (I ll touch the why in that separate post I just promised). Disclaimer: It s friday. This is not entirely(?) a serious post. So, I was looking at Documentation/technical/index-format.txt, and saw:

32-bit number of index entries.

What? The index/staging area can t handle more than ~4.3 billion files? There I was, writing Rust code to write out the index.

try!(out.write_u32::<NetworkOrder>(self.entries.len()));

(For people familiar with the byteorder crate and wondering what NetworkOrder is, I have a use byteorder::BigEndian as NetworkOrder) And the Rust compiler rightfully barfed:

error: mismatched types:
 expected  u32 ,
    found  usize  [E0308]

And there I was, wondering: mmmm should I just add as u32 and silently truncate or hey what does git do? And it turns out, git uses an unsigned int to track the number of entries in the first place, so there is no truncation happening. Then I thought but what happens when cache_nr reaches the max? Well, it turns out there s only one obvious place where the field is incremented. What? Holy coffin nails, Batman! No overflow check? Wait a second, look 3 lines above that:

ALLOC_GROW(istate->cache, istate->cache_nr + 1, istate->cache_alloc);

Yeah, obviously, if you re incrementing cache_nr, you already have that many entries in memory. So, how big would that array be?

        struct cache_entry **cache;

So it s an array of pointers, assuming 64-bits pointers, that s ~34.3 GB. But, all those cache_nr entries are in memory too. How big is a cache entry?

struct cache_entry  
        struct hashmap_entry ent;
        struct stat_data ce_stat_data;
        unsigned int ce_mode;
        unsigned int ce_flags;
        unsigned int ce_namelen;
        unsigned int index;     /* for link extension */
        unsigned char sha1[20];
        char name[FLEX_ARRAY]; /* more */
 ;

So, 4 ints, 20 bytes, and as many bytes as necessary to hold a path. And two inline structs. How big are they?


struct hashmap_entry  
        struct hashmap_entry *next;
        unsigned int hash;
 ;
struct stat_data  
        struct cache_time sd_ctime;
        struct cache_time sd_mtime;
        unsigned int sd_dev;
        unsigned int sd_ino;
        unsigned int sd_uid;
        unsigned int sd_gid;
        unsigned int sd_size;
 ;

Woohoo, nested structs.

struct cache_time  
        uint32_t sec;
        uint32_t nsec;
 ;

So all in all, we re looking at 1 + 2 + 2 + 5 + 4 32-bit integers, 1 64-bits pointer, 2 32-bits padding, 20 bytes of sha1, for a total of 92 bytes, not counting the variable size for file paths. The average path length in mozilla-central, which only has slightly over 140 thousands of them, is 59 (including the terminal NUL character). Let s conservatively assume our crazy repository would have the same average, making the average cache entry 151 bytes. But memory allocators usually allocate more than requested. In this particular case, with the default allocator on GNU/Linux, it s 156 (weirdly enough, it s 152 on my machine). 156 times 4.3 billion 670 GB. Plus the 34.3 from the array of pointers: 704.3 GB. Of RAM. Not counting the memory allocator overhead of handling that. Or all the other things git might have in memory as well (which apparently involves a hashmap, too, but I won t look at that, I promise). I think one would have run out of memory before hitting that integer overflow. Interestingly, looking at Documentation/technical/index-format.txt again, the on-disk format appears smaller, with 62 bytes per file instead of 92, so the corresponding index file would be smaller. (And in version 4, paths are prefix-compressed, so paths would be smaller too). But having an index that large supposes those files are checked out. So let s say I have an empty ext4 file system as large as possible (which I m told is 2^60 bytes (1.15 billion gigabytes)). Creating a small empty ext4 tells me at least 10 inodes are allocated by default. I seem to remember there s at least one reserved for the journal, there s the top-level directory, and there s lost+found ; there apparently are more. Obviously, on that very large file system, We d have a git repository. git init with an empty template creates 9 files and directories, so that s 19 more inodes taken. But git init doesn t create an index, and doesn t have any objects. We d thus have at least one file for our hundreds of gigabyte index, and at least 2 who-knows-how-big files for the objects (a pack and its index). How many inodes does that leave us with? The Linux kernel source tells us the number of inodes in an ext4 file system is stored in a 32-bits integer. So all in all, if we had an empty very large file system, we d only be able to store, at best, 2^32 22 files And we wouldn t even be able to get cache_nr to overflow. while following the rules. Because the index can keep files that have been removed, it is actually possible to fill the index without filling the file system. After hours (days? months? years? decades?*) of running

seq 0 4294967296   while read i; do touch $i; git update-index --add $i; rm $i; done

One should be able to reach the integer overflow. But that d still require hundreds of gigabytes of disk space and even more RAM.

At the rate it was possible to add files to the index when I tried (yeah, I tried), for a few minutes, and assuming a constant rate, the estimate is close to 2 years. But the time spent reading and writing the index file increases linearly with its size, so the longer it d run, the longer it d take.

Ok, it s actually much faster to do it hundreds of thousand files at a time, with something like:

seq 0 100000 4294967296   while read i; do j=$(seq $i $(($i + 99999))); touch $j; git update-index --add $j; rm $j; done

At the rate the first million files were added, still assuming a constant rate, it would take about a month on my machine. Considering reading/writing a list of a million files is a thousand times faster than reading a list of a billion files, assuming linear increase, we re still talking about decades, and plentiful RAM. Fun fact: after leaving it run for 5 times as much as it had run for the first million files, it hasn t even done half more One could generate the necessary hundreds-of-gigabytes index manually, that wouldn t be too hard, and assuming it could be done at about 1 GB/s on a good machine with a good SSD, we d be able to craft a close-to-explosion index within a few minutes. But we d still lack the RAM to load it. So, here is the open question: should I report that integer overflow? Wow, that was some serious procrastination. Edit: Epilogue: Actually, oops, there is a separate integer overflow on the reading side that can trigger a buffer overflow, that doesn t actually require a large index, just a crafted header, demonstrating that yes, not all integer overflows are equal.

wlc 0.4, a command line utility for Weblate, has been just released. This release doesn't bring much changes, but still worth announcing. The most important change is that development repository has been moved under WeblateOrg organization at GitHub, you can now find it at https://github.com/WeblateOrg/wlc. Another important news is that Debian package is currently waiting in NEW queue and will hopefully soon hit unstable. wlc is built on API introduced in Weblate 2.6 and still being in development. Several commands from wlc will not work properly if executed against Weblate 2.6, first fully supported version will be 2.7 (current git is okay as well, it is now running on both demo and hosting servers). You can usage examples in the wlc documentation.

Filed under: Debian English SUSE Weblate 0 comments

With more than 10 million users, the Scratch online community is the largest online community where kids learn to program. Since it was created, a central goal of the community has been to promote remixing the reworking and recombination of existing creative artifacts. As the video above shows, remixing programming projects in the current web-based version of Scratch is as easy is as clicking on the see inside button in a project web-page, and then clicking on the remix button in the web-based code editor. Today, close to 30% of projects on Scratch are remixes. Remixing plays such a central role in Scratch because its designers believed that remixing can play an important role in learning. After all, Scratch was designed first and foremost as a learning community with its roots in the Constructionist framework developed at MIT by Seymour Papert and his colleagues. The design of the Scratch online community was inspired by Papert s vision of a learning community similar to Brazilian Samba schools (Henry Jenkins writes about his experience of Samba schools in the context of Papert s vision here), and a comment Marvin Minsky made in 1984:

Adults worry a lot these days. Especially, they worry about how to make other people learn more about computers. They want to make us all computer-literate. Literacy means both reading and writing, but most books and courses about computers only tell you about writing programs. Worse, they only tell about commands and instructions and programming-language grammar rules. They hardly ever give examples. But real languages are more than words and grammar rules. There s also literature what people use the language for. No one ever learns a language from being told its grammar rules. We always start with stories about things that interest us.

In a new paper titled Remixing as a pathway to Computational Thinking that was recently published at the ACM Conference on Computer Supported Collaborative Work and Social Computing (CSCW) conference, we used a series of quantitative measures of online behavior to try to uncover evidence that might support the theory that remixing in Scratch is positively associated with learning.

Of course, because Scratch is an informal environment with no set path for users, no lesson plan, and no quizzes, measuring learning is an open problem. In our study, we built on two different approaches to measure learning in Scratch. The first approach considers the number of distinct types of programming blocks available in Scratch that a user has used over her lifetime in Scratch (there are 120 in total) something that can be thought of as a block repertoire or vocabulary. This measure has been used to model informal learning in Scratch in an earlier study. Using this approach, we hypothesized that users who remix more will have a faster rate of growth for their code vocabulary. Controlling for a number of factors (e.g. age of user, the general level of activity) we found evidence of a small, but positive relationship between the number of remixes a user has shared and her block vocabulary as measured by the unique blocks she used in her non-remix projects. Intriguingly, we also found a strong association between the number of downloads by a user and her vocabulary growth. One interpretation is that this learning might also be associated with less active forms of appropriation, like the process of reading source code described by Minksy. The second approach we used considered specific concepts in programming, such as loops, or event-handling. To measure this, we utilized a mapping of Scratch blocks to key programming concepts found in this paper by Karen Brennan and Mitchel Resnick. For example, in the image below are all the Scratch blocks mapped to the concept of loop .

We looked at six concepts in total (conditionals, data, events, loops, operators, and parallelism). In each case, we hypothesized that if someone has had never used a given concept before, they would be more likely to use that concept after encountering it while remixing an existing project. Using this second approach, we found that users who had never used a concept were more likely to do so if they had been exposed to the concept through remixing. Although some concepts were more widely used than others, we found a positive relationship between concept use and exposure through remixing for each of the six concepts. We found that this relationship was true even if we ignored obvious examples of cutting and pasting of blocks of code. In all of these models, we found what we believe is evidence of learning through remixing. Of course, there are many limitations in this work. What we found are all positive correlations we do not know if these relationships are causal. Moreover, our measures do not really tell us whether someone has understood the usage of a given block or programming concept.However, even with these limitations, we are excited by the results of our work, and we plan to build on what we have. Our next steps include developing and utilizing better measures of learning, as well as looking at other methods of appropriation like viewing the source code of a project.

This blog post and the paper it describes are collaborative work with Sayamindu Dasgupta, Andr s Monroy-Hern ndez, and William Hale. The paper is released as open access so anyone can read the entire paper here. This blog post was also posted on Sayamindu Dasgupta s blog and on Medium by the MIT Media Lab.

What happened in the Reproducible Builds effort between May 8th and May 14th 2016: Documentation updates

Ximin Luo spent some work on improving and updating our documentation:
- updated the SOURCE_DATE_EPOCH adoption page at wiki.debian.org/ReproducibleBuilds/TimestampsProposal with a compilation of our reasonings versus common arguments encountered with package maintainers and upstream developers.
- updated wiki.debian.org/ReproducibleBuilds/Howto
- and also moved resolved issues to wiki.debian.org/ReproducibleBuilds/OldIssues.
https://anonscm.debian.org/git/reproducible/ has been cleaned up by Ximin, including the removal of the link from diffoscope.git to debbindiff.git.

Toolchain fixes

dpkg 1.18.7 has been uploaded to unstable, after which Mattia Rizzolo took care of rebasing our patched version.
- this prompted h01ger to restart the discussion about reproducible dpkg for sid and stretch
- to which Guillem replied with a status update.
gcc-5 and gcc-6 migrated to testing with the patch to honour SOURCE_DATE_EPOCH
Ximin Luo started an upstream discussion with the Ghostscript developers.
Norbert Preining has uploaded a new version of texlive-bin with these changes relevant to us:
- imported Upstream version 2016.20160512.41045 support for suppressing timestamps (SOURCE_DATE_EPOCH) (Closes: #792202)
- add support for SOURCE_DATE_EPOCH also to luatex
cdbs 0.4.131 has been uploaded to unstable by Jonas Smedegaard, fixing these issues relevant to us:
- #794241: export SOURCE_DATE_EPOCH. Original patch by akira
- #764478: call dh_strip_nondeterminism if available. Original patch by Holger Levsen
libxslt 1.1.28-3 has been uploaded to unstable by Mattia Rizzolo, fixing the following toolchain issues:
- #823857: backport patch from upstream to provide stable IDs in the genrated documents.
- #791815: Honour SOURCE_DATE_EPOCH when embedding timestamps in docs. Patch by Eduard Sanou.

Packages fixed The following 28 packages have become newly reproducible due to changes in their build dependencies: actor-framework ask asterisk-prompt-fr-armelle asterisk-prompt-fr-proformatique coccinelle cwebx d-itg device-tree-compiler flann fortunes-es idlastro jabref konclude latexdiff libint minlog modplugtools mummer mwrap mxallowd mysql-mmm ocaml-atd ocamlviz postbooks pycorrfit pyscanfcs python-pcs weka The following 9 packages had older versions which were reproducible, and their latest versions are now reproducible again due to changes in their build dependencies: csync2 dune-common dune-localfunctions libcommons-jxpath-java libcommons-logging-java libstax-java libyanfs-java python-daemon yacas The following packages have become newly reproducible after being fixed:

asymptote/2.38-1 by Norbert Preining, original patch by Alexis Bienven e.
bnd/2.4.1-4 by Emmanuel Bourg.
cgal/4.8-4 by Joachim Reichel.
courier/0.76.1-2 by Ond ej Sur , original patch by Alexis Bienven e.
dict-foldoc/20160505-1 by Iustin Pop, original patch by Reiner Herrmann.
emoslib/2:4.4.1-2 by Alastair McKinstry, original patch by boyska.
libksba/1.3.4-3 by Andreas Metzler.
mp4h/1.3.1-15 by Axel Beckert.
stk/4.5.2+dfsg-2 by Felipe Sateler, original patch by Alexis Bienven e.
sympow/1.023-7 by Jerome Benoit.
x11proto-input/2.3.2-1 by Julien Cristau, original patch by Eduard Sanou.

The following packages had older versions which were reproducible, and their latest versions are now reproducible again after being fixed:

klibc/2.0.4-9 by Ben Hutchings.

Some uploads have fixed some reproducibility issues, but not all of them:

allegro4.4/2:4.4.2-8 by Andreas R nnquist, original patch by Daniel Shahaf.
gearmand/1.0.6-7 by Alexandre Mestiashvili, original patch by Chris Lamb.
mame/0.172-1 by Jordi Mallach.
python-babel/1.3+dfsg.1-7 by Jean-Michel Vourg re, original patch by Val Lorentz.
openttd/1.6.0-1 by Matthijs Kooijman.
blender/2.77.a+dfsg0-4 by Matteo F. Vescovi, original patch by Campbell Burton.
unburden-home-dir/0.4 by Axel Beckert.
refpolicy/2:2.20140421-10 by Laurent Bigonville, original patch by Chris Lamb.
kdiff3/0.9.98-3 by Eike Sauer.

Patches submitted that have not made their way to the archive yet:

#787424 against emacs24 by Alexis Bienven e: order hashes when generating .el files
#823764 against sen by Daniel Shahaf: render the build timestamp in a consistent timezone
#823797 against openclonk by Alexis Bienven e: honour SOURCE_DATE_EPOCH
#823961 against herbstluftwm by Fabian Wolff: honour SOURCE_DATE_EPOCH
#824049 against emacs24 by Alexis Bienven e: make start value of gensym-counter reproducible
#824050 against emacs24 by Alexis Bienven e: make autoloads files reproducible
#824182 against codeblocks by Fabian Wolff: honour SOURCE_DATE_EPOCH
#824263 against cmake by Reiner Herrmann: sort file lists from file(GLOB ...)

Package reviews 344 reviews have been added, 125 have been updated and 20 have been removed in this week. 14 FTBFS bugs have been reported by Chris Lamb. tests.reproducible-builds.org

bpi0 has been reinstalled with a new ssd. (vagrant/h01ger)
A new navigation menu for the Debian pages at https://tests.reproducible-builds.org/ has been implemented. (h01ger)

Misc. Dan Kegel sent a mail to report about his experiments with a reproducible dpkg PPA for Ubuntu. According to him sudo add-apt-repository ppa:dank/dpkg && sudo apt-get update && sudo apt-get install dpkg should be enough to get reproducible builds on Ubuntu 16.04. This week's edition was written by Ximin Luo and Holger Levsen and reviewed by a bunch of Reproducible builds folks on IRC.

GnuPG Crowdfunding and sticker

I've just received my laptop sticker from the GnuPG crowdfund http://goteo.org/project/gnupg-new-website-and-infrastructure: it is of excellent quality, but comes with HOWTO-like detailed instructions to apply it in the proper way.

This strikes me as oddly appropriate.

#gnupg

What happened in the Reproducible Builds effort between April 24th and 30th 2016. Media coverage Reproducible builds were mentioned explicitly in two talks at the Mini-DebConf in Vienna:

Martin Michlmayr had a talk in which he presented an overview about innovations and changes in Debian in the last years. Martin expressed his disappointment that there was no talk from us in Vienna (we'll fix this at DebConf16 in Cape Town) and described the reproducible builds work as "a real innovation". His talk is very much worth seeing, whatever your current perspective, it might change your view on Debian.
Ben Hutchings explains how Secure Boot will use signed kernels via separate signature packages and how this was designed with reproducible builds in mind.

Aspiration together with the OTF CommunityLab released their report about the Reproducible Builds summit in December 2015 in Athens. Toolchain fixes Now that the GCC development window has been opened again, the SOURCE_DATE_EPOCH patch by Dhole and Matthias Klose to address the issue timestamps_from_cpp_macros (__DATE__ / __TIME__) has been applied upstream and will be released with GCC 7. Following that Matthias Klose also has uploaded gcc-5/5.3.1-17 and gcc-6/6.1.1-1 to unstable with a backport of that SOURCE_DATE_EPOCH patch. Emmanuel Bourg uploaded maven/3.3.9-4, which uses SOURCE_DATE_EPOCH for the maven.build.timestamp. (SOURCE_DATE_EPOCH specification) Other upstream changes Alexis Bienven e submitted a patch to Sphinx which extends SOURCE_DATE_EPOCH support for copyright years in generated documentation. Packages fixed The following 12 packages have become reproducible due to changes in their build dependencies: hhvm jcsp libfann libflexdock-java libjcommon-java libswingx1-java mobile-atlas-creator not-yet-commons-ssl plexus-utils squareness svnclientadapter The following packages have became reproducible after being fixed:

anope/2.0.3-2 by Dominic Hargreaves, original patch by Alexis Bienven e.
apparmor-profiles-extra/1.7 by intrigeri, patch by Felix Geyer.
basket/2.10~beta+git20160425.b77687f-1 by Luigi Toscano, original patch by Alexis Bienven e.
bind-dyndb-ldap/8.0-2 by Timo Aaltonen.
bliss/0.73-1 by Jerome Benoit.
gap-alnuth/3.0.0-3 by Bill Allombert.
gap-radiroot/2.7-2 by Bill Allombert.
genometools/1.5.8+ds-3 by Sascha Steinbiss.
gprbuild/2015-3 by Nicolas Boulenguez.
gtkhash/0.7.0-3 by M nica Ram rez Arceda.
josm/0.0.svn10161+dfsg-1 by Bas Couwenberg.
libjgoodies-animation-java/1.4.3-1 by Markus Koschany.
ltrsift/1.0.2-7 by Sascha Steinbiss.
mutt/1.6.0-1 by Matteo F. Vescovi, original patch by Daniel Shahaf.
netsniff-ng/0.6.1-1 by Kartik Mistry, original patch by Reiner Herrmann.
netrw/1.3.2-3 by Giovanni Mascellani.
openthesaurus/20160424-3 by Rene Engelhard, original patch by Dhole.
php-pear/1:1.10.1+submodules+notgz-8 by Mathieu Parent.
samba/2:4.4.2+dfsg-2 by Jelmer Vernoo .
swift/2.7.0-3 by Ond ej Nov .
tp-smapi/0.42-1 by Evgeni Golov, original patch by Chris Lamb.
vifm/0.8.1a-0.1 by Ond ej Nov .
xfonts-mplus/1:2.2.4-2 by Hideki Yamane, original patch by Chris Lamb.
xpa/2.1.17-2 by Ole Streicher, original patch by Alexis Bienven e.

Some uploads have fixed some reproducibility issues, but not all of them:

camitk/4.0.0~beta-1 by Emmanuel Promayon, original patch by Maria Valentina Marin.
elastix/4.8-8 by Gert Wollny.
freefem++/3.46+dfsg1-1 by Dimitrios Eftaxiopoulos, original patch by Alexis Bienven e.
grib-api/1.15.0-1 by Alastair McKinstry, original patch by Santiago Vila.
isorelax/20041111-9 by Emmanuel Bourg.
libical/2.0.0-0.1 by Andreas Henriksson, original patch by Chris Lamb.
sawfish/1:1.11.90-1 by Jose M Calhariz, original patch by Alexis Bienven e.
singular/4.0.3-p1+ds-1 by Jerome Benoit.
syslinux/3:6.03+dfsg-12 by Mattia Rizzolo, original patch by Reiner Herrmann.
transdecoder/2.0.1+dfsg-3 by Michael R. Crusoe, original patch by Dhole.

Patches submitted that have not made their way to the archive yet:

#822566 against stk by Alexis Bienven e: sort lists of object files for reproducible linking order.
#822948 against shotwell by Alexis Bienven e: normalize tarball permissions and use locale/timezone-independent modification time.
#822963 against htop by Alexis Bienven e: use SOURCE_DATE_EPOCH for embedded copyright year, which has before already been applied in git and upstream.

Package reviews 95 reviews have been added, 15 have been updated and 129 have been removed in this week. 22 FTBFS bugs have been reported by Chris Lamb and Martin Michlmayr. diffoscope development

diffoscope 52~bpo8+1 has been uploaded to jessie-backports by Mattia Rizzolo, where it is currently waiting for NEW-approval.
Support for the deb(5) format (uncompressed data.tar/control.tar, control.tar.xz) (Closes: #818414) has been completed by Reiner Herrmann in git.

strip-nondeterminism development

Support for EPUB documents has been added (to the development version in git) by Holger Levsen, to address the timestamps_in_epub issue.

tests.reproducible-builds.org

To monitor the uptodateness of diffoscope everywhere, tests checking this on FreeBSD, NetBSD & MacPorts were added. Tests on other distributions will be added once the relevant bugs in whohas are fixed in jessie. (h01ger)

Misc. Amongst the 29 interns who will work on Debian through GSoC and Outreachy there are four who will be contributing to Reproducible Builds for Debian and Free Software. We are very glad to welcome ceridwen, Satyam Zode, Scarlett Clark and Valerie Young and look forward to working together with them the coming months (and maybe beyond)! This week's edition was written by Reiner Herrmann and Holger Levsen and reviewed by a bunch of Reproducible builds folks on IRC.

More than 10 years ago, I launched Unhappy Birthday in a fit of copyrighteous exuberance. In the last decade, I have been interviewed on the CBC show WireTap and have received an unrelenting stream of hate mail from random strangers. With a recently announced settlement suggesting that Happy Birthday is on its way into the public domain, it s not possible for even the highest-protectionist in me to justify the continuation of the campaign in its original form. As a result, I ve suspended the campaign while I plan my next move. Here s the full text of the notice I posted on the Unhappy Birthday website:

Unfortunately, a series of recent legal rulings have forced us to suspend our campaign. In 2015, Time Warner s copyright claim to Happy Birthday was declared invalid. In 2016, a settlement was announced that calls for a judge to officially declare that the song is in the public domain. This is horrible news for the future of music. It is horrible news for anybody who cares that creators, their heirs, etc., are fairly remunerated when their work is performed. What incentive will there be for anybody to pen the next Happy Birthday knowing that less than a century after their deaths their estates and the large multinational companies that buy their estates might not be able to reap the financial rewards from their hard work and creativity? We are currently planning a campaign to push for a retroactive extension of copyright law to place Happy Birthday, and other works, back into the private domain where they belong! We believe this is a winnable fight. After all, copyright has been retroactively extended before! Stay tuned! In the meantime, we ll keep this page here for historical purposes.
Copyrighteous Benjamin Mako Hill (2016-02-11)

My office door is on the second floor in front the major staircase in my building. I work with my door open so that my colleagues and my students know when I m in. The only time I consider deviating from this policy is the first week of the quarter when I m faced with a stream of students, usually lost on their way to class and that, embarrassingly, I am usually unable to help. I made this poster so that these conversations can, in a way, continue even when I am not in the office.

A lot has been written about the Internet in Cuba over the years. I have read a few articles, from New York Times' happy support for Google's invasion of Cuba to RSF's dramatic and fairly outdated report about censorship in Cuba. Having written before about Internet censorship in Tunisia, I was curious to see if I could get a feel of what it is like over there, now that a new Castro is in power and the Obama administration has started restoring diplomatic ties with Cuba. With those political changes coming signifying the end of an embargo that has been called genocidal by the Cuban government, it is surprisingly difficult to get fresh information about the current state of affairs. This article aims to fill that gap in clarifying how the internet works in Cuba, what kind of censorship mechanisms are in place and how to work around them. It also digs more technically into the network architecture and performance. It is published in the hope of providing both Cubans and the rest of the world with a better understanding of their network and, if possible, Cubans ways to access the internet more cheaply or without censorship.

"Censorship" and workarounds Unfortunately, I have been connected to the internet only through the the Varadero airport and the WiFi of a "full included" resort near Jibacoa. I have to come to assume that this network is likely to be on a segregated, uncensored internet while the rest of the country suffers the wrath of the Internet censorship in Cuba I have seen documented elsewhere. Through my research, I couldn't find any sort of direct censorship. The Netalyzr tool couldn't find anything significantly wrong with the connection, other than the obvious performance problems related both to the overloaded uplinks of the Cuban internet. I ran an incomplete OONI probe as well, and it seems there was no obvious censorship detected there as well, at least according to folks in the helpful `#ooni` IRC channel. Tor also works fine, and could be a great way to avoid the global surveillance system described later in this article. Nevertheless, it still remains to be seen how the internet is censored in the "real" Cuban internet, outside of the tourist designated areas - hopefully future visitors or locals can expand on this using the tools mentioned above, using the regular internet. Usual care should be taken when using any workaround tools, mentioned in this post or not, as different regimes over the world have accused, detained, tortured and killed sometimes for the mere fact of using or distributing circumvention tools. For example, a Russian developer was arrested and detained in 2001 by United States' FBI for exposing vulnerabilities in the Adobe e-books copy protection mechanisms. Similarly, people distributing Tor and other tools have been arrested during the period prior to the revolution in Tunisia.

The Cuban captive portal There is, however, a more pernicious and yet very obvious censorship mechanism at work in Cuba: to get access to the internet, you have to go through what seems to be a state-wide captive portal, which I have seen both at the hotel and the airport. It is presumably deployed at all levels of the internet access points. To get credentials through that portal, you need a username and password which you get by buying a Nauta card. Those cards cost 2$CUC and get you an hour of basically unlimited internet access. That may not seem like a lot for a rich northern hotel party-goer, but for Cubans, it's a lot of money, given that the average monthly salary is around 20$CUC. The system is also pretty annoying to use, because it means you do not get continuous network access: every hour, you need to input a new card, which will obviously make streaming movies and other online activities annoying. It also makes hosting servers basically impossible. So while Cuba does not have, like China or Iran, a "great firewall", there is definitely a big restriction to going online in Cuba. Indeed, it seems to be how the government ensures that Cubans do not foment too much dissent online: keep the internet slow and inaccessible, and you won't get too many Arab spring / blogger revolutions.

Bypassing the Cuban captive portal The good news is that it is perfectly possible for Cubans (or at least for a tourist like me with resources outside of the country) to bypass the captive portal. Like many poorly implemented portals, the portal allows DNS traffic to go through, which makes it possible to access the global network for free by using a tool like iodine which tunnels IP traffic over DNS requests. Of course, the bandwidth and reliability of the connection you get through such a portal is pretty bad. I have regularly seen 80% packet loss and over two minutes of latency:

--- 10.0.0.1 ping statistics ---
163 packets transmitted, 31 received, 80% packet loss, time 162391ms
rtt min/avg/max/mdev = 133.700/2669.535/64188.027/11257.336 ms, pipe 65

Still, it allowed me to login to my home server through SSH using Mosh to workaround the reliability issues. Every once in a while, mosh would get stuck and keep on trying to send packets to probe the server, which would clog the connection even more. So I regularly had to restart the whole stack using these commands:

killall iodine # stop DNS tunnel
nmcli n off # turn off wifi to change MAC address
macchanger -A wlan0 # change MAC address
nmcli n on # turn wifi back on
sleep 3 # wait for wifi to settle
iodine-client-start # restart DNS tunnel

The Koumbit Wiki has good instructions on how to setup a DNS tunnel. I am wondering if such a public service could be of use for Cubans, although I am not sure how it could be deployed only for Cubans, and what kind of traffic it could support... The fact is that iodine does require a server to operate, and that server must be run on the outside of the censored perimeter, something that Cubans may not be able to afford in the first place. Another possible way to save money with the captive portal would be to write something that automates connecting and disconnecting from the portal. You would feed that program a list of credentials and it would connect to the portal only on demand, and disconnect as soon as no traffic goes through. There are details on the implementation of the captive portal below that may help future endeavours in that field.

Private information revealed to the captive portal It should be mentioned, however, that the captive portal has a significant amount of information on clients, which is a direct threat to the online privacy of Cuban internet users. Of course the unique identifiers issued with the Nauta cards can be correlated with your identity, right from the start. For example, I had to give my room number to get a Nauta card issued. Then the central portal also knows which access point you are connected to. For example, the central portal I was connected to `Wifi_Memories_Jibacoa` which, for anyone that cares to research, will give them a location of about 20 square meters where I was located when connected (there is only one access point in the whole hotel). Finally, the central portal also knows my MAC address, a unique identifier for the computer I am using which also reveals which brand of computer I am using (Mac, Lenovo, etc). While this address can be changed, very few people know that, let alone how. This led me to question whether I would be allowed back in Cuba (or even allowed out!) after publishing this blog post, as it is obvious that I can be easily identified based on the time this article was published, my name and other details. Hopefully the Cuban government will either not notice or not care, but this can be a tricky situation, obviously. I have heard that Cuban prisons are not the best hangout place in Cuba, to say the least...

Network configuration assessment This section is more technical and delves more deeply in the Cuban internet to analyze the quality and topology of the network, along with hints as to which hardware and providers are being used to support the Cuban government.

Line quality The internet is actually not so bad in the hotel. Again, this may be because of the very fact that I am in that hotel, and I get a privileged access to the new fiber line to Venezuela, the ALBA-1 link. The line speed I get is around 1mbps, according to speedtest, which selected a server from LIME in George Town, Cayman Islands:

[1034]anarcat@angela:cuba$ speedtest
Retrieving speedtest.net configuration...
Retrieving speedtest.net server list...
Testing from Empresa de Telecomunicaciones de Cuba (152.206.92.146)...
Selecting best server based on latency...
Hosted by LIME (George Town) [391.78 km]: 317.546 ms
Testing download speed........................................
Download: 1.01 Mbits/s
Testing upload speed..................................................
Upload: 1.00 Mbits/s

Latency to the rest of the world is of couse slow:

--- koumbit.org ping statistics ---
122 packets transmitted, 120 received, 1,64% packet loss, time 18731,6ms
rtt min/avg/max/sdev = 127,457/156,097/725,211/94,688 ms
--- google.com ping statistics ---
122 packets transmitted, 121 received, 0,82% packet loss, time 19371,4ms
rtt min/avg/max/sdev = 132,517/160,095/724,971/93,273 ms
--- redcta.org.ar ping statistics ---
122 packets transmitted, 120 received, 1,64% packet loss, time 40748,6ms
rtt min/avg/max/sdev = 303,035/339,572/965,092/97,503 ms
--- ccc.de ping statistics ---
122 packets transmitted, 72 received, 40,98% packet loss, time 19560,2ms
rtt min/avg/max/sdev = 244,266/271,670/594,104/61,933 ms

Interestingly, Koumbit is actually the closest host in the above test. It could be that Canadian hosts are less affected by bandwidth problems compared to US hosts because of the embargo.

Network topology The various traceroutes show a fairly odd network topology, but that is typical of what I would described as "colonized internet users", which have layers and layers of NAT and obscure routing that keep them from the real internet. Just like large corporations are implementing NAT in a large scale, Cuba seems to have layers and layers of private RFC 1918 IPv4 space. A typical traceroute starts with:

traceroute to koumbit.net (199.58.80.33), 30 hops max, 60 byte packets
 1  10.156.41.1 (10.156.41.1)  9.724 ms  9.472 ms  9.405 ms
 2  192.168.134.137 (192.168.134.137)  16.089 ms  15.612 ms  15.509 ms
 3  172.31.252.113 (172.31.252.113)  15.350 ms  15.805 ms  15.358 ms
 4  pos6-0-0-agu-cr-1.mpls.enet.cu (172.31.253.197)  15.286 ms  14.832 ms  14.405 ms
 5  172.31.252.29 (172.31.252.29)  13.734 ms  13.685 ms  14.485 ms
 6  200.0.16.130 (200.0.16.130)  14.428 ms  11.393 ms  10.977 ms
 7  200.0.16.74 (200.0.16.74)  10.738 ms  10.019 ms  10.326 ms
 8  ix-11-3-1-0.tcore1.TNK-Toronto.as6453.net (64.86.33.45)  108.577 ms  108.449 ms

Let's take this apart line by line:

 1  10.156.41.1 (10.156.41.1)  9.724 ms  9.472 ms  9.405 ms

This is my local gateway, probably the hotel's wifi router.

 2  192.168.134.137 (192.168.134.137)  16.089 ms  15.612 ms  15.509 ms

This is likely not very far from the local gateway, probably still in Cuba. It in one bit away from the captive portal IP address (see below) so it is very likely related to the captive portal implementation.

 3  172.31.252.113 (172.31.252.113)  15.350 ms  15.805 ms  15.358 ms
 4  pos6-0-0-agu-cr-1.mpls.enet.cu (172.31.253.197)  15.286 ms  14.832 ms  14.405 ms
 5  172.31.252.29 (172.31.252.29)  13.734 ms  13.685 ms  14.485 ms

All those are withing RFC 1918 space. Interestingly, the Cuban DNS servers resolve one of those private IPs as within Cuban space, on line #4. That line is interesting because it reveals the potential use of MPLS.

 6  200.0.16.130 (200.0.16.130)  14.428 ms  11.393 ms  10.977 ms
 7  200.0.16.74 (200.0.16.74)  10.738 ms  10.019 ms  10.326 ms

Those two lines are the only ones that actually reveal that the route belongs in Cuba at all. Both IPs are in a tiny (/24, or 256 IP addresses) network allocated to ETECSA, the state telco in Cuba:

inetnum:     200.0.16/24
status:      allocated
aut-num:     N/A
owner:       EMPRESA DE TELECOMUNICACIONES DE CUBA S.A. (IXP CUBA)
ownerid:     CU-CUBA-LACNIC
responsible: Rafael L pez Guerra
address:     Ave. Independencia y 19 Mayo, s/n,
address:     10600 - La Habana - CH
country:     CU
phone:       +53 7 574242 []
owner-c:     JOQ
tech-c:      JOQ
abuse-c:     JEM52
inetrev:     200.0.16/24
nserver:     NS1.NAP.ETECSA.NET
nsstat:      20160123 AA
nslastaa:    20160123
nserver:     NS2.NAP.ETECSA.NET
nsstat:      20160123 AA
nslastaa:    20160123
created:     20030512
changed:     20140610

Then the last hop:

 8  ix-11-3-1-0.tcore1.TNK-Toronto.as6453.net (64.86.33.45)  108.577 ms  108.449 ms  108.257 ms

...interestingly, lands directly in Toronto, in this case going later to Koumbit but that is the first hop that varies according to the destination, hops 1-7 being a common trunk to all external communications. It's also interesting that this shoves a good 90 milliseconds extra in latency, showing that a significant distance and number of equipment crossed. Yet a single hop is crossed, not showing the intermediate step of the Venezuelan link or any other links for that matter. Something obscure is going on there... Also interesting to note is the traceroute to the redirection host, which is only one hop away:

traceroute to 192.168.134.138 (192.168.134.138), 30 hops max, 60 byte packets
 1  192.168.134.138 (192.168.134.138)  6.027 ms  5.698 ms  5.596 ms

Even though it is not the gateway:

$ ip route
default via 10.156.41.1 dev wlan0  proto static  metric 1024
10.156.41.0/24 dev wlan0  proto kernel  scope link  src 10.156.41.4
169.254.0.0/16 dev wlan0  scope link  metric 1000

This means a very close coordination between the different access points and the captive portal system. Finally, note that there seems to be only three peers to the Cuban internet: Teleglobe, formerly Canadian, now owned by the Indian [[!wiki Tata group]], and Telef nica, the Spanish Telco that colonized most of Latin America's internet, all the way down to Argentina. This is confirmed by my traceroutes, which show traffic to Koumbit going through Tata and Google's going through Telef nica.

Captive portal implementation The captive portal is https://www.portal-wifi-temas.nauta.cu/ (not accessible outside of Cuba) and uses a self-signed certificate. The domain name resolves to 190.6.81.230 in the hotel. Accessing http://1.1.1.1/ gives you a status page which allows you to disconnect from the portal. It actually redirects you to https://192.168.134.138/logout.user. That is also a self-signed, but different certificate. That certificate actually reveals the implication of Gemtek which is a "world-leading provider of Wireless Broadband solutions, offering a wide range of solutions from residential to business". It is somewhat unclear if the implication of Gemtek here is deliberate or a misconfiguration on the part of Cuban officials, especially since the certificate is self-signed and was issued in 2002. It could be, however, a trace of the supposed involvement of China in the development of Cuba's networking systems, although Gemtek is based in Taiwan, and not in the China mainland. That IP, in turn, redirects you to the same portal but in a page that shows you the statistics:

https://www.portal-wifi-temas.nauta.cu/?mac=0024D1717D18&script=logout.user&remain_time=00%3A55%3A52&session_time=00%3A04%3A08&username=151003576287&clientip=10.156.41.21&nasid=Wifi_Memories_Jibacoa&r=ac%2Fpopup

Notice how you see the MAC address of the machine in the URL (randomized, this is not my MAC address), along with the remaining time, session time, client IP and the Wifi access point ESSID. There may be some potential in defrauding the session time there, I haven't tested it directly. Hitting Actualizar redirects you back to the IP address, which redirects you to the right URL on the portal. The "real" logout is at:

http://192.168.134.138/logout.user?cmd=logout

The login is performed against https://www.portal-wifi-temas.nauta.cu/index.php?r=ac/login with a referer of:

https://www.portal-wifi-temas.nauta.cu/?&nasid=Wifi_Memories_Jibacoa&nasip=192.168.134.138&clientip=10.156.41.21&mac=EC:55:F9:C5:F2:55&ourl=http%3a%2f%2fgoogle.ca%2f&sslport=443&lang=en-US%2cen%3bq%3d0.8&lanip=10.156.41.1

Again, notice the information revealed to the central portal.

Equipment and providers I ran Nmap probes against both the captive portal and the redirection host, in the hope of finding out how they were built and if they could reveal the source of the equipment used. The complete nmap probes are available in nmap, but it seems that the captive portal is running some embedded device. It is confusing because the probe for the captive portal responds as if it was the gateway, which blurs even more the distinction between the hotel's gateway and the captive portal. This raises the distinct possibility that all access points are actually captive portal that authenticate to another central server. The nmap traces do show three distinct hosts however:

the captive portal (`www.portal-wifi-temas.nauta.cu`, `190.6.81.230`)

some redirection host (`192.168.134.138`)

the hotel's gateway (`10.156.41.1`)

They do have distinct signatures so the above may be just me misinterpreting traceroute and nmap results. Your comments may help in clarifying the above. Still, the three devices show up as running Linux, in the two last cases versions between `2.4.21` and `2.4.31`. Now, to find out which version of Linux it is running is way more challenging, and it is possible it is just some custom Linux distribution. Indeed, the webserver shows up as `G4200.GSI.2.22.0155` and the SSH server is running `OpenSSH 3.0.2p1`, which is basically prehistoric (2002!) which corroborates the idea that this is some Gemtek embedded device. The fact that those devices are running 14 years old software should be a concern to the people responsible for those networks. There is, for example, a remote root vulnerability that affects that specific version of OpenSSH, among many other vulnerabilities.

A note on Nauta card's security Finally, one can note that it is probably trivial to guess card UIDs. All cards i have here start with the prefix 15100, the following digits being 3576 or 4595, presumably depending on the "batch" that was sent to different hotels, which seems to be batches of 1000 cards. You can also correlate the UID with the date at which the card was issued. For example, 15100357XXX cards are all valid until 19/03/2017, and 151004595XXX cards are all valid until 23/03/2017. Here's the list of UIDs I have seen:

151004595313
151004595974
151003576287
151003576105
151003576097

The passwords, on the other hand, do seem fairly random (although my sample size is small). Interestingly, those passwords are also 12 digits long, which is about as strong as a seven-letter password (mixed uppercase and lowercase). If there are no rate-limiting provisions on that captive portal, it could be possible to guess those passwords, since you have free rein on accessing those routers. Depending on the performance of the routers, you could be lucky and find a working password for free...

Conclusion Clearly, Internet access in Cuba needs to be modernized. We can clearly see that Cuba years behind the rest of the Americas, if only through the percentage of the population with internet access, or download speeds. The existence of a centralized captive portal also enables a huge surveillance potential that should be a concern for any Cuban, or for that matter, anyone wishing to live in a free society. The answer, however, lies not in the liberalization of commerce and opening the doors to the US companies and their own systems of surveillance. It should be possible, and even desirable for Cubans to establish their own neutral network, a proposal I have made in the past even for here in Qu bec. This network could be used and improved by Cubans themselves, prioritizing local communities that would establish their own infrastructure according to their own needs. I have been impressed by this article about the El Paquete system - it shows great innovation and initiative from Cubans which are known for engaging in technology in a creative way. This should be leveraged by letting Cubans do what they want with their networks, not telling them what to do. The best the Googles of this world can do to help Cuba is not to colonize Cuba's technological landscape but to cleanup their own and make their own tools more easily accessible and shareable offline. It is something companies can do right now, something I detailed in a previous article.

I m organizing an event at the University of Washington in Seattle that involves a reading, the screening of a documentary film, and a Q&A about Aaron Swartz. The event coincides with the third anniversary of Aaron s death and the release of a new book of Swartz s writing that I contributed to.

The event is free and open the public and details are below:

WHEN: Wednesday, January 13 at 6:30-9:30 p.m.

WHERE: Communications Building (CMU) 120, University of Washington

We invite you to celebrate the life and activism efforts of Aaron Swartz, hosted by UW Communication professor Benjamin Mako Hill. The event is next week and will consist of a short book reading, a screening of a documentary about Aaron s life, and a Q&A with Mako who knew Aaron well details are below. No RSVP required; we hope you can join us.

Aaron Swartz was a programming prodigy, entrepreneur, and information activist who contributed to the core Internet protocol RSS and co-founded Reddit, among other groundbreaking work. However, it was his efforts in social justice and political organizing combined with his aggressive approach to promoting increased access to information that entangled him in a two-year legal nightmare that ended with the taking of his own life at the age of 26.

January 11, 2016 marks the third anniversary of his death. Join us two days later for a reading from a new posthumous collection of Swartz s writing published by New Press, a showing of The Internet s Own Boy (a documentary about his life), and a Q&A with UW Communication professor Benjamin Mako Hill a former roommate and friend of Swartz and a contributor to and co-editor of the first section of the new book. If you re not in Seattle, there are events with similar programs being organized in Atlanta, Chicago, Dallas, New York, and San Francisco. All of these other events will be on Monday January 11 and registration is required for all of them. I will be speaking at the event in San Francisco.

The New Press has published a new collection of Aaron Swartz s writing called The Boy Who Could Change the World: The Writings of Aaron Swartz. I worked with Seth Schoen to introduce and help edit the opening section of book that includes Aaron s writings on free culture, access to information and knowledge, and copyright. Seth and I have put our introduction online under an appropriately free license (CC BY-SA).

Over the last week, I ve read the whole book again. I think the book really is a wonderful snapshot of Aaron s thought and personality. It s got bits that make me roll my eyes, bits that make me want to shout in support, and bits that continue to challenge me. It all makes me miss Aaron terribly. I strongly recommend the book. Because the publication is post-humous, it s meant that folks like me are doing media work for the book. In honor of naming the book their progressive pick of the week, Truthout has also published an interview with me about Aaron and the book. Other folks who introduced and/or edited topical sections in the book are David Auerbach (Computers), David Segal (Politics), Cory Doctorow (Media), James Grimmelmann (Books and Culture), and Astra Taylor (Unschool). The book is introduced by Larry Lessig.

At LibrePlanet 2015 (the FSF s annual conference), I gave a talk called Access Without Empowerment as one of the conference keynote addresses. As I did for my 2013 LibrePlanet talk, I ve edited together a version that includes the slides and I ve posted it online in WebM and on YouTube.

Here s the summary written up in the LibrePlanet program:

The free software movement has twin goals: promoting access to software through users freedom to share, and empowering users by giving them control over their technology. For all our movement s success, we have been much more successful at the former. I will use data from free software and from several related movements to explain why promoting empowerment is systematically more difficult than promoting access and I will explore how our movement might address the second challenge in the future.

In related news, registration is open for LibrePlanet 2016 and that it s free for FSF members. If you re not an FSF member, the FSF annual fundraiser is currently going on so now would be a great time to join.

Search Results: "teo"

13 March 2017

Michal Čihař: Weblate users survey

3 March 2017

Michal Čihař: Weblate 2.12

17 February 2017

Michal Čihař: What's coming in Weblate 2.12

3 February 2017

Benjamin Mako Hill: New Dataset: Five Years of Longitudinal Data from Scratch

31 January 2017

Michal Čihař: Weblate 2.11

Benjamin Mako Hill: Supporting children in doing data science

12 December 2016

Michal Čihař: New location for Weblate

26 October 2016

Christoph Egger: Installing a python systemd service?

8 July 2016

Mike Hommey: Are all integer overflows equal?

Michal Čihař: wlc 0.4

4 July 2016

Benjamin Mako Hill: Studying the relationship between remixing & learning

17 May 2016

Reproducible builds folks: Reproducible builds: week 55 in Stretch cycle

11 May 2016

Elena 'valhalla' Grandi: GnuPG Crowdfunding and sticker

2 May 2016

Reproducible builds folks: Reproducible builds: week 53 in Stretch cycle

12 February 2016

Benjamin Mako Hill: Unhappy Birthday Suspended

4 February 2016

Benjamin Mako Hill: Welcome Back Poster

25 January 2016

Antoine Beaupr : Internet in Cuba

Network configuration assessment This section is more technical and delves more deeply in the Cuban internet to analyze the quality and topology of the network, along with hints as to which hardware and providers are being used to support the Cuban government.

5 January 2016

Benjamin Mako Hill: Celebrate Aaron Swartz in Seattle (or Atlanta, Chicago, Dallas, NYC, SF)

4 January 2016

Benjamin Mako Hill: The Boy Who Could Change the World: The Writings of Aaron Swartz

3 January 2016

Benjamin Mako Hill: Access Without Empowerment (LibrePlanet 2015 Keynote)